Abstract: This paper is on homonymous distributed systems where processes are prone to crash failures and have no initial knowledge of the system membership ("homonymous" means that several processes may have the same identifier). New classes of failure detectors suited to these systems are first defined. Among them, the classes HΩ and HΣ are introduced, which are the homonymous counterparts of the classes Ω and Σ, respectively. (Recall that the pair (Ω, Σ) defines the weakest failure detector to solve consensus.) Then, the paper shows how HΩ and HΣ can be implemented in homonymous systems without membership knowledge (under different synchrony requirements). Finally, two algorithms are presented that use these failure detectors to solve consensus in homonymous asynchronous systems where there is no initial knowledge of the membership. One algorithm solves consensus with (HΩ, HΣ), while the other uses only HΩ, but needs a majority of correct processes.

Observe that systems with unique identifiers and anonymous systems are extreme cases of homonymous systems, from which it follows that all these results also apply to them. Interestingly, the new failure detector class HΩ can be implemented with partial synchrony, while the analogous class AΩ defined for anonymous systems cannot be implemented (even in synchronous systems). Hence, the paper provides the first proof showing that consensus can be solved in anonymous systems with only partial synchrony (and a majority of correct processes).

Keywords: Agreement problem, Asynchrony, Consensus, Distributed computability, Failure detector, Homonymous system, Message-passing, Process crash.

I. Introduction 

Homonymous systems. Distributed computing is about mastering the uncertainty created by adversaries. The first adversary is of course the fact that the processes are geographically distributed, which makes it impossible to instantaneously obtain a global state of the system. An adversary can be static (e.g., synchrony or anonymity) or dynamic (e.g., asynchrony, mobility, etc.). The combination of asynchrony and failures is the most studied pair of adversaries.

This paper is on agreement in crash-prone message-passing distributed systems. While this topic has been deeply investigated in the past in the context of asynchrony and process failures (e.g., [11], [19]), we additionally consider here that several processes can have the same identity, i.e., the additional static adversary that is homonymy. A motivation for homonymous processes in distributed systems can be found in [?] where, for example, users keep their privacy by taking their domain as their identifier (the same identifier is then assigned to all the users of the same domain). Observe that homonymy generalizes two cases: (1) having unique identifiers and (2) having the same identifier for all the processes (anonymity), which are the two extremes of homonymy.

We also assume that the distributed system has to face another static adversary, namely the fact that, initially, each process only knows its own identity. We say that the system has to work without initial knowledge of the membership. This static adversary has recently been identified as significantly relevant in certain distributed contexts [16].

How to face adversaries. It is well known that many problems cannot be solved in the presence of some adversaries (e.g., [?], [14], [20]). When considering process crash failures, the failure detector approach introduced in [8], [?] (see [?] for an introductory presentation) has proved to be very attractive. It allows one to enrich a distributed system that is otherwise too weak to solve a given problem P, in order to obtain a more powerful system in which P can be solved.

A failure detector is a distributed oracle that provides processes with additional information related to failed processes, and can consequently be used to enrich the computability power of asynchronous send/receive message-passing systems. Depending on the type (set of process identities, integers, etc.) and the quality of this information, several failure detector classes have been proposed. We refer the reader to [19], which presents classes of failure detectors suited to agreement and communication problems, the corresponding failure detector-based algorithms, and additional behavioral assumptions that (when satisfied) allow these failure detectors to be implemented. It is interesting to observe that none of the original failure detectors introduced in [9] can be implemented without initial knowledge of the membership [16].

Aim of the paper. Agreement problems are central as soon as one wants to capture the essence of distributed computing. (If processes do not have to agree in one way or another, the problem to solve is not a distributed computing problem!) The aim of this paper is consequently to understand the type of information on failures that is needed to solve an agreement problem in the presence of asynchrony, process crashes, homonymy, and lack of initial knowledge of the membership. As consensus is the most central agreement problem, we focus on it.

Related work. As far as we know, consensus in anonymous networks was first addressed in [3], [18] ([18] considers different synchrony assumptions, while [3] considers systems enriched with failure detectors). Connectivity requirements for agreement in anonymous networks are addressed in [18].

To the best of our knowledge, up to now agreement in homonymous systems has been addressed only in [12] and [?]. In the former paper the authors consider that, among the n processes, up to t of them can commit Byzantine failures. The system is homonymous in the sense that there are ℓ, 1 ≤ ℓ ≤ n, different authenticated identities, each process has one identity, and several processes can share the same identity. It is shown in that paper that ℓ > 3t and ℓ > (n + 3t)/2 are necessary and sufficient conditions for solving consensus in synchronous systems and partially synchronous systems, respectively. The latter paper [?] mainly explores consensus in a shared memory system with anonymous processes, and bounds the complexity (namely, the individual write and step complexities) of solving consensus with the aid of an anonymous leader elector AΩ (see below). Its authors show that these bounds can be improved if the system is homonymous instead of purely anonymous.

The consensus problem in anonymous asynchronous crash-prone message-passing systems has been recently addressed in [3] (for the first time to our knowledge). In such systems, processes have no identity at all.¹ This paper introduces an anonymous counterpart² (denoted AP later in [4]) of the perfect failure detector P introduced in [9]. A failure detector of class AP returns an upper bound (that eventually becomes tight) on the current number of alive processes. The paper then shows that there is an inherent price associated with anonymous consensus: while the lower bound on the number of rounds in a non-anonymous system enriched with P is t + 1 (where t is the maximum number of faulty processes), it is 2t + 1 in an anonymous system enriched with AP. The proposed algorithm assumes knowledge of the parameter t.

¹ They must also execute the same program, because otherwise they could use the program (or a hash of it) as their identity. We consider that having no identity is the same as having the same identity for all processes, since a process that lacks an identity can choose a default value (e.g., ⊥) as its identifier.

² In this paper, when we say that a failure detector A is the counterpart of a failure detector B, we mean that, in a classical asynchronous system (i.e., where each process has its own identity) enriched with a failure detector of class A, it is possible to design an algorithm that builds a failure detector of class B, and vice versa (exchanging A and B). Said differently, A and B have the same computability power in a classical crash-prone asynchronous system.

More general failure detectors suited to anonymous distributed systems are presented in [4]. Among other results, this paper introduces the anonymous counterpart AΣ of the quorum failure detector class Σ [11] and the anonymous counterpart AΩ of the eventual leader failure detector class Ω [8]. It also presents the failure detector class AP̄, which is the complement of AP. An important result of [4] is that the relations linking failure detector classes are not the same in non-anonymous and anonymous systems. This is also the case if processes do not know the number n of processes in the system (unknown membership in anonymous systems): if n is unknown, the equivalence between AP and AP̄ shown in [4] no longer holds.

Regarding implementability, it is stated in [4] that AΩ is not realistic (i.e., it cannot be implemented in an anonymous synchronous system [10]). If the membership is unknown, it is not hard to show that AP̄ is not realistic either, applying techniques similar to those in [16]. On the other hand, while AP can be implemented in an anonymous synchronous system, it is easy to show that it cannot be implemented in most partially synchronous systems (in particular, in those with all links eventually timely).
Contributions. As mentioned, we explore the consensus problem in homonymous systems. The additional adversaries considered are asynchrony, process crashes, and lack of initial knowledge of the membership. We summarize the main contributions of this paper as follows.

First, the paper defines new classes of failure detectors suited to homonymous systems. These classes, denoted HΩ and HΣ, are shown to be homonymous counterparts of Ω and Σ, respectively. The interest in the latter classes is motivated by the fact that (Σ, Ω) is the weakest failure detector to solve consensus in crash-prone asynchronous message-passing systems for any number of process failures [11]. The paper also investigates the relations linking HΣ, AΣ and Σ, and shows that both HΩ and HΣ can be obtained from AP in asynchronous anonymous systems. As a byproduct, we also introduce a new failure detector class, denoted ◇HP̄, that is the homonymous counterpart of ◇P̄ (the complement of ◇P [9]), which we consider of independent interest.

Then, the paper explores the implementability of these classes of failure detectors. It presents an implementation of ◇HP̄ in homonymous message-passing systems with partially synchronous processes and eventually timely links. This algorithm does not require that the processes know the system membership. Since HΩ can be trivially implemented from ◇HP̄ without communication, HΩ is realistic and can also be implemented in a partially synchronous homonymous system without membership knowledge. The paper also presents an implementation of HΣ in a synchronous homonymous message-passing system without membership knowledge.

Finally, the paper presents two consensus algorithms for asynchronous homonymous systems enriched with HΩ. Both algorithms are derived from consensus algorithms for anonymous systems proposed in [6] and [4], respectively. The main challenge, and hence the main contribution of our algorithms, is to modify the original algorithms, which used AΩ, so that they use HΩ instead. In the second algorithm, the use of AΣ has also been replaced by the use of HΣ.

The first algorithm assumes that each process knows the value n and that a majority of processes is correct in all executions.³ Since, as mentioned, HΩ can be implemented with partial synchrony, the combination of the algorithms presented (to implement HΩ and to solve consensus with HΩ) forms a distributed algorithm that solves consensus in any homonymous system with partially synchronous processes, eventually timely links, and a majority of correct processes. Applied to anonymous systems, this result relaxes the known conditions to solve consensus, since previous algorithms were based on unrealistic failure detectors (AΩ) or on failure detectors that require a larger degree of synchrony (AP).

The second consensus algorithm works for any number of process crashes and does not need to know n, but assumes that the system is enriched with the pair of failure detectors (HΣ, HΩ). This algorithm, combined with the algorithms to implement HΣ and HΩ, shows that the consensus problem can be solved in synchronous homonymous systems subject to any number of crash failures without initial knowledge of either the parameter t or the membership. Applied to anonymous systems, this result relaxes the known conditions to solve consensus under any number of failures, since previous algorithms used unrealistic detectors (AΩ) or required knowledge of t or of an upper bound on it.

This second consensus algorithm also forces us to restate the conjecture about which could be the weakest failure detector to solve consensus in asynchronous anonymous systems. The algorithm solves consensus in anonymous systems with a pair (HΣ, HΩ), and we describe how it can be modified to solve consensus with a pair (HΣ, AΩ). Additionally, as mentioned, it is shown here that HΣ can be obtained from AΣ, and that both HΣ and HΩ can be obtained from AP. The conjecture issued in [?] was that (AΣ, AΩ) ⊕ AP⁴ could be the weakest failure detector. Then, using the same algorithm described in [?] to combine the consensus algorithms for (HΣ, AΩ) and (HΣ, HΩ), the new candidate to be the weakest failure detector for consensus despite anonymity is now (HΣ, AΩ) ⊕ (HΣ, HΩ).

³ The knowledge of n can be replaced by the knowledge of a parameter α such that α > n/2 and, in all executions, at least α processes are correct.

⁴ ⊕ represents a form of composition in which the resulting failure detector outputs ⊥ for a finite time until it behaves, at all processes, as one (and the same) of the two detectors that are combined.

Roadmap. The paper is made up of five sections. Section II presents the system model. Section III introduces failure detector classes suited to homonymous systems, and explores their relation with other classes and their implementability. Finally, Section V presents failure detector-based homonymous consensus algorithms.

II. System Model 

Homonymous processes. Let Π denote the set of processes, with |Π| = n. We use id(p) to denote the identity of process p ∈ Π. Different processes may have the same identity, i.e., $p \neq q \not\Rightarrow \mathit{id}(p) \neq \mathit{id}(q)$. Two processes with the same identity are said to be homonymous. Let S ⊆ Π be any subset of processes. We define I(S) as the multiset (sometimes also called bag) of process identities in S, $I(S) = \{\mathit{id}(p) : p \in S\}$. Recall that, unlike in a set, an element of a multiset can appear more than once. Hence, as I(S) may contain the same identity several times, we always have $|I(S)| = |S|$. The multiplicity (number of instances) of identity i in a multiset I is denoted $\mathit{mult}_I(i)$. When I is clear from the context we simply use mult(i). $P(I) \subseteq \Pi$ denotes the processes whose identity is in the multiset I, i.e., $P(I) = \{p : p \in \Pi \wedge \mathit{id}(p) \in I\}$. Every process p ∈ Π knows its own identity id(p). Unless otherwise stated, a process p knows neither the system membership I(Π), nor the system size n, nor any upper bound t on the number of faulty processes. Observe that the set Π is a formalization tool that is not known by the processes of the system.
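The multiset notation above can be mirrored with Python's collections.Counter, which is a convenient way to check small examples by hand. The following is only an illustrative sketch: the process names p1..p4 and the mapping ident are hypothetical and stand for the global view Π that, as noted above, no process actually has; the function names I, mult and P simply mirror the paper's notation.

```python
from collections import Counter

# Hypothetical global view of Pi: process names are labels used only for this
# illustration; processes themselves never see this mapping.
ident = {"p1": "a", "p2": "a", "p3": "b", "p4": "c"}

def I(S):
    """Multiset I(S) of the identities of the processes in S."""
    return Counter(ident[p] for p in S)

def mult(multiset, i):
    """Multiplicity mult_I(i) of identity i in a multiset (0 if absent)."""
    return multiset[i]

def P(multiset):
    """P(I): every process whose identity appears in the multiset."""
    return {p for p, i in ident.items() if multiset[i] > 0}

S = {"p1", "p2", "p3"}
print(I(S))                          # Counter({'a': 2, 'b': 1})
print(sum(I(S).values()) == len(S))  # True: |I(S)| = |S|
print(mult(I(S), "a"))               # 2
print(sorted(P(Counter({"a": 1}))))  # ['p1', 'p2']
```

Note that P({a}) contains both p1 and p2: by definition P(I) collects every process whose identity occurs in I, regardless of the multiplicity.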

Processes are asynchronous, unless otherwise stated. We assume that time advances in discrete steps. We assume a global clock whose values are the positive natural numbers, but processes cannot access it. Processes can fail by crashing, i.e., by stopping taking steps. A process that crashes in a run is said to be faulty, and a process that is not faulty in a run is said to be correct. The set of correct processes is denoted by Correct ⊆ Π.

Communication. The processes can invoke the primitive broadcast(m) to send a message m to all processes of the system (including itself). This communication primitive is modeled in the following way. The network is assumed to have a directed link from process p to process q for each pair of processes p, q ∈ Π (p need not be different from q). Then, broadcast(m) invoked at process p sends one copy of message m along the link from p to q, for each q ∈ Π. Unless otherwise stated, links are asynchronous and reliable, i.e., links neither lose, duplicate, nor corrupt messages, nor generate spurious messages. If a process crashes while broadcasting a message, the message is received by an arbitrary subset of processes.

Notation and time-related definitions. The previous model is denoted HAS[∅] (Homonymous Asynchronous System). We use HPS[∅] to denote a homonymous system where processes are partially synchronous and links are eventually timely. A process is partially synchronous if the time to execute a step is bounded, but the bound is unknown. A link is eventually timely if there is an unknown global stabilization time (denoted GST) after which all messages sent across the link are delivered within a bounded time δ, where δ is unknown. Messages sent before GST can be lost or delivered after an arbitrary (but finite) time.

AS[∅] denotes the classical asynchronous system with unique identities and reliable channels. Finally, AAS[∅] denotes the Anonymous Asynchronous System model [?]. Observe that AS[∅] and AAS[∅] are special cases (actually extreme cases with respect to homonymy) of HAS[∅] (an anonymous system can be seen as a homonymous system where all processes have the same default identifier ⊥).

III. Failure Detectors 

In this section we recall previously proposed failure detectors and define the ones proposed here for homonymous systems. Then, relationships between these detectors are derived, and their implementability is explored.

Failure detectors for classical and anonymous systems. We briefly describe here some previously proposed failure detectors. We start with the classes that have been defined for AS[∅].

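A failure detector of class Σ [11] provides each process p ∈ Π with a variable trusted_p which contains a set of process identifiers. Denoting by trusted_p^τ the value of trusted_p at time τ, the properties satisfied by these sets are [Liveness] $\forall p \in \mathit{Correct}, \exists \tau \in \mathbb{N} : \forall \tau' \geq \tau, \mathit{trusted}_p^{\tau'} \subseteq I(\mathit{Correct})$, and [Safety] $\forall p, q \in \Pi, \forall \tau, \tau' \in \mathbb{N}, \mathit{trusted}_p^{\tau} \cap \mathit{trusted}_q^{\tau'} \neq \emptyset$.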

A failure detector of class Ω [8] provides each process p ∈ Π with a variable leader_p such that [Election] eventually all these variables contain the same process identifier of a correct process.

The following failure detector classes have been defined for anonymous systems AAS[∅].

A failure detector of class AΩ [4] provides each process p ∈ Π with a Boolean variable a_leader_p such that [Election] there is a time after which, permanently, (1) there is a correct process whose Boolean variable is true, and (2) the Boolean variables of the other correct processes are false.

A failure detector of class AP [3] provides each process p ∈ Π with a variable anap_p such that, if anap_p^τ and |Correct^τ| denote the value of this variable and the number of alive processes at time τ, respectively, then [Safety] $\forall p \in \Pi, \forall \tau \in \mathbb{N}, \mathit{anap}_p^{\tau} \geq |\mathit{Correct}^{\tau}|$, and [Liveness] $\exists \tau \in \mathbb{N}, \forall p \in \mathit{Correct}, \forall \tau' \geq \tau, \mathit{anap}_p^{\tau'} = |\mathit{Correct}|$.
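To make the two AP properties concrete, here is a small checker for a finite, hand-written trace. It is a sketch under stated assumptions: the trace format and the function name check_ap are hypothetical, and since a finite trace cannot witness "eventually forever", the checker only reports the earliest time from which the Liveness condition holds up to the end of the trace.

```python
def check_ap(trace, correct):
    """Check the AP properties on a finite, hypothetical trace.

    trace[t] = (alive_t, anap_t): alive_t is |Correct^t|, the number of
    processes alive at time t, and anap_t maps each alive process to anap_p.
    Returns (safety_ok, tau): safety_ok tells whether anap_p >= alive_t always
    held; tau is the earliest time from which every correct process outputs
    |Correct| until the end of the trace (None if the last step fails).
    """
    safety_ok = all(v >= alive for alive, anap in trace for v in anap.values())
    tau = None
    for t in range(len(trace) - 1, -1, -1):
        _, anap = trace[t]
        if all(anap.get(p) == len(correct) for p in correct):
            tau = t
        else:
            break
    return safety_ok, tau


# Three processes; p3 crashes between times 1 and 2.
trace = [
    (3, {"p1": 3, "p2": 3, "p3": 3}),
    (3, {"p1": 3, "p2": 3, "p3": 3}),
    (2, {"p1": 3, "p2": 3}),   # still a valid upper bound, not yet tight
    (2, {"p1": 2, "p2": 3}),
    (2, {"p1": 2, "p2": 2}),
    (2, {"p1": 2, "p2": 2}),
]
print(check_ap(trace, {"p1", "p2"}))  # (True, 4)
```

The example shows the intended behavior of AP: the output may stay above the number of alive processes for a while after a crash, but never below it, and it eventually becomes tight at every correct process.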



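A failure detector of class AΣ [4] provides each process p ∈ Π with a variable a_sigma_p that contains a set of pairs of the form (x, y). The parameter x is a label provided by the failure detector, and y is an integer. Let us denote by a_sigma_p^τ the value of variable a_sigma_p at time τ. Let $S_a(x) = \{p \in \Pi \mid \exists \tau \in \mathbb{N} : (x, -) \in \mathit{a\_sigma}_p^{\tau}\}$. Any failure detector of class AΣ must satisfy the following properties:

• Validity. No set a_sigma_p ever contains simultaneously two pairs with the same label.

• Monotonicity. $\forall p \in \Pi, \forall \tau \in \mathbb{N} : ((x, y) \in \mathit{a\_sigma}_p^{\tau}) \Rightarrow (\forall \tau' \geq \tau : \exists y' \leq y : (x, y') \in \mathit{a\_sigma}_p^{\tau'})$.

• Liveness. $\forall p \in \mathit{Correct}, \exists \tau \in \mathbb{N} : \forall \tau' \geq \tau : \exists (x, y) \in \mathit{a\_sigma}_p^{\tau'} : |S_a(x) \cap \mathit{Correct}| \geq y$.

• Safety. $\forall p_1, p_2 \in \Pi, \forall \tau_1, \tau_2 \in \mathbb{N}, \forall (x_1, y_1) \in \mathit{a\_sigma}_{p_1}^{\tau_1} : \forall (x_2, y_2) \in \mathit{a\_sigma}_{p_2}^{\tau_2} : \forall T_1 \subseteq S_a(x_1) : \forall T_2 \subseteq S_a(x_2) : ((|T_1| = y_1) \wedge (|T_2| = y_2)) \Rightarrow (T_1 \cap T_2 \neq \emptyset)$.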

Failure detectors for homonymous systems. Classical failure detectors output a set of process identifiers. Our failure detectors extend this output to a multiset of process identifiers, due to the homonymous nature of the system. The following are the new failure detectors proposed for homonymous systems.

A failure detector of class ◇HP̄ eventually outputs forever the multiset with the identifiers of the correct processes. More formally, a failure detector of class ◇HP̄ provides each process p ∈ Π with a variable h_trusted_p such that [Liveness] $\forall p \in \mathit{Correct}, \exists \tau \in \mathbb{N} : \forall \tau' \geq \tau, \mathit{h\_trusted}_p^{\tau'} = I(\mathit{Correct})$. This failure detector ◇HP̄ is the counterpart of ◇P̄.

A failure detector of class HΩ eventually outputs the same identifier ℓ and number c at all processes, such that ℓ is the identifier of some correct process, and c is the number of correct processes that have this identifier ℓ. More formally, a failure detector of class HΩ provides each process p ∈ Π with two variables h_leader_p and h_multiplicity_p such that [Election] $\exists \ell \in I(\mathit{Correct}), \exists \tau \in \mathbb{N} : \forall \tau' \geq \tau, \forall p \in \mathit{Correct}, \mathit{h\_leader}_p^{\tau'} = \ell \wedge \mathit{h\_multiplicity}_p^{\tau'} = \mathit{mult}_{I(\mathit{Correct})}(\ell)$.

Any correct process p such that id(p) = ℓ is called a leader. Note that this failure detector does not choose only one leader, as Ω or AΩ do, but a set of leaders with the same identifier. When all identifiers are different, the class HΩ is equivalent to Ω. Furthermore, a failure detector of class HΩ can be obtained from any detector D of class ◇HP̄ without any communication (for instance, by periodically setting, at each process p, h_leader_p to the smallest element in D.h_trusted_p, and h_multiplicity_p ← mult_{D.h_trusted_p}(h_leader_p)).
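The local transformation from ◇HP̄ to HΩ described above needs no messages, only a deterministic choice over the current multiset. A minimal Python sketch, assuming the ◇HP̄ output is available as a collections.Counter and that identifiers are totally ordered (the function name h_omega_from_trusted is hypothetical):

```python
from collections import Counter

def h_omega_from_trusted(h_trusted):
    """Derive (h_leader, h_multiplicity) locally from a ◇HP̄-style output.

    h_trusted: Counter mapping identifiers to multiplicities (a multiset).
    Identifiers are assumed to be totally ordered so that min() is defined.
    Returns None while the multiset is empty.
    """
    present = +h_trusted          # drop identifiers with non-positive counts
    if not present:
        return None
    leader = min(present)         # smallest identifier, as in the text
    return leader, present[leader]


# Once h_trusted has stabilized to I(Correct) (hypothetical identifiers):
print(h_omega_from_trusted(Counter({"d": 1, "b": 2, "c": 1})))  # ('b', 2)
```

Because every correct process eventually holds the same multiset I(Correct), applying the same deterministic rule yields the same pair (ℓ, mult(ℓ)) everywhere, which is exactly the Election property.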

A failure detector of class HΣ provides each process p ∈ Π with two variables h_quora_p and h_labels_p, where h_quora_p is a set of pairs of the form (x, m) (x is a label, and m is a multiset such that m ⊆ I(Π)) and h_labels_p is a set of labels. Roughly speaking, each pair (x, m) determines a set of quora, and the set h_labels_p of a process p determines in which of these sets it participates. More formally, let us denote by h_quora_p^τ and h_labels_p^τ the values of variables h_quora_p and h_labels_p at time τ, respectively. Let $S(x) = \{p \in \Pi \mid \exists \tau \in \mathbb{N} : x \in \mathit{h\_labels}_p^{\tau}\}$. Any failure detector of class HΣ must satisfy the following properties:

• Validity. No set h_quora_p ever contains simultaneously two pairs with the same label.

• Monotonicity. $\forall p \in \Pi, \forall \tau \in \mathbb{N}, \forall \tau' \geq \tau$: (1) $\mathit{h\_labels}_p^{\tau} \subseteq \mathit{h\_labels}_p^{\tau'}$, and (2) $((x, m) \in \mathit{h\_quora}_p^{\tau}) \Rightarrow \exists m' \subseteq m : (x, m') \in \mathit{h\_quora}_p^{\tau'}$.

• Liveness. $\forall p \in \mathit{Correct}, \exists \tau \in \mathbb{N} : \forall \tau' \geq \tau, \exists (x, m) \in \mathit{h\_quora}_p^{\tau'} : m \subseteq I(S(x) \cap \mathit{Correct})$.

• Safety. $\forall p_1, p_2 \in \Pi, \forall \tau_1, \tau_2 \in \mathbb{N}, \forall (x_1, m_1) \in \mathit{h\_quora}_{p_1}^{\tau_1} : \forall (x_2, m_2) \in \mathit{h\_quora}_{p_2}^{\tau_2} : \forall Q_1 \subseteq S(x_1), \forall Q_2 \subseteq S(x_2), ((I(Q_1) = m_1) \wedge (I(Q_2) = m_2)) \Rightarrow (Q_1 \cap Q_2 \neq \emptyset)$.
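The HΣ Safety property quantifies over every set of processes Q ⊆ S(x) whose identities form exactly the multiset m. For tiny systems this can be checked by brute force, which helps to see why homonymy matters. The sketch below is hypothetical (the snapshot format and the names quorums and h_sigma_safe are ours) and takes the global view ident that only an external observer has.

```python
from collections import Counter
from itertools import combinations

def quorums(S_x, m, ident):
    """All subsets Q of S_x with I(Q) == m (brute force; tiny systems only)."""
    size = sum(m.values())
    return [set(Q) for Q in combinations(sorted(S_x), size)
            if Counter(ident[p] for p in Q) == m]

def h_sigma_safe(pairs, S, ident):
    """Check the HΣ Safety property on a finite snapshot.

    pairs: every (label, multiset) pair that appeared in some h_quora_p.
    S: label -> set of processes that ever had the label in h_labels_p.
    ident: process -> identifier (global view, for checking only).
    """
    for x1, m1 in pairs:
        for x2, m2 in pairs:
            for Q1 in quorums(S.get(x1, set()), m1, ident):
                for Q2 in quorums(S.get(x2, set()), m2, ident):
                    if not Q1 & Q2:
                        return False
    return True


ident = {"p1": "a", "p2": "a", "p3": "b"}
S = {"x": {"p1", "p2", "p3"}, "y": {"p1", "p2", "p3"}}
print(h_sigma_safe([("x", Counter({"a": 1, "b": 1}))], S, ident))  # True
print(h_sigma_safe([("y", Counter({"a": 1}))], S, ident))          # False
```

The second snapshot fails because the multiset {a} is matched both by {p1} and by {p2}, which are disjoint: with homonyms, a multiset of identifiers does not determine which processes form a quorum.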

Comparing _ff S and AY,, one can observe that HY, has pairs 
(x, to) in which to is a multiset of identifiers, while AY 
uses pairs (x, y) in which y is an integer However, a more 
important difference is that, in HY, each process has two 
variables. Then, the labels that a process p has in h_quorap 
can be disconnected from those it has in h_labelsp. This 
allows for additional flexibility in HY. 



Reductions between failure detectors. In this section we state, via reductions, how the newly defined failure detector classes relate to the previously defined classes. We use the standard way of comparing the relative power of failure detector classes of [9]. A failure detector class X is stronger than class X' in system Y[∅] if there is an algorithm A that emulates the output of a failure detector of class X' in Y[X] (i.e., system Y[∅] enhanced with a failure detector D of class X). We also say that X' can be obtained from X in Y[∅]. Two classes are equivalent if this property can be shown in both directions.

We only present the main results here. The proofs and additional details can be found in the Appendix. The first result shows that, in classical systems with unique identifiers, Σ, HΣ, and AΣ are equivalent.

Theorem 1. Failure detector classes Σ, HΣ, and AΣ are equivalent in AS[∅]. Furthermore, the transformations between Σ and HΣ do not require initial knowledge of the membership.

In anonymous systems we have the following properties. Recall that an anonymous system is assumed to be a homonymous system in which every process has a default identifier ⊥.⁵

⁵ Note that this differs from the assumption used in [4].



Theorem 2. Class HΣ can be obtained from class AΣ in AAS[∅] without communication.

Theorem 3. Classes ◇HP̄ and HΣ can be obtained from class AP in AAS[∅] without communication.
Figure 1. Relations between failure detector classes in the models AS[∅] and AAS[∅]. There is an arrow from class X to X' if X is stronger than X'. Solid arrows are relations shown by Bonnet and Raynal in [4]. Dashed arrows are relations shown here, while dotted arrows are trivial relations.



IV. Implementing Failure Detectors in 
Homonymous Systems 

In this section, we show that there are algorithms that implement the failure detector classes ◇HP̄ and HΩ in HPS[∅] (homonymous partially synchronous system). We also implement the failure detector HΣ in HSS[∅] (homonymous synchronous system). In all cases, the algorithms do not need to know the membership initially.

A. Implementation of ◇HP̄ and HΩ

The algorithm of Figure 2 implements ◇HP̄ (and HΩ, with trivial changes) in HPS[∅], where processes are partially synchronous, links are eventually timely, and the membership is not known.

Brief description of the algorithm: It is a polling-based algorithm that executes in rounds. At every round r, Task T1 of each process p broadcasts a (POLLING, r, id(p)) message. After waiting timeout_p time, it gathers in variable tmp_p (and, hence, also in h_trusted_p) the multiset of the senders' identifiers ids carried by the (P_REPLY, r', r'', id(p), ids) messages received with r' ≤ r ≤ r''.

Task T2 handles the reception of POLLING and P_REPLY messages. When a process p receives a (POLLING, r, id(q)) message from process q, process p responds with all the P_REPLY messages that process q needs to receive up to round r and that p has not previously sent (Lines 28–30). Note that these P_REPLY messages are piggybacked in only one message (Line 29). Also note that p holds in variable latest_r_p[id(q)] the latest round for which it has broadcast a reply to identifier id(q). If it is the first time that process p receives a (POLLING, −, id) message from a process with identifier id, then variable latest_r_p[id] is created and initialized to zero (Lines 23–27).

1  init
2    h_trusted_p ← ∅;   // multiset of process identifiers
3    mship_p ← ∅;       // set of process identifiers
4    r_p ← 1;
5    timeout_p ← 1;
6    start Tasks T1 and T2;
7
8  Task T1
9    repeat forever
10     broadcast (POLLING, r_p, id(p));
11     wait timeout_p time;
12     tmp_p ← ∅;   // tmp_p is an auxiliary multiset
13     for each (P_REPLY, r, r', id(p), id(q)) received
14         with (r ≤ r_p ≤ r') do
15       add one instance of id(q) to tmp_p;
16     end for;
17     h_trusted_p ← tmp_p;
18     r_p ← r_p + 1;
19   end repeat;
20
21 Task T2
22   upon reception of (POLLING, r_q, id(q)) do
23     if id(q) ∉ mship_p then
24       mship_p ← mship_p ∪ {id(q)};
25       create latest_r_p[id(q)];
26       latest_r_p[id(q)] ← 0;
27     end if;
28     if latest_r_p[id(q)] < r_q then
29       broadcast (P_REPLY, latest_r_p[id(q)] + 1, r_q, id(q), id(p));
30     end if;
31     latest_r_p[id(q)] ← max(latest_r_p[id(q)], r_q);
32
33   upon reception of (P_REPLY, r, r', id(p), −) with (r < r_p) do
34     timeout_p ← timeout_p + 1;

Figure 2. Algorithm that implements ◇HP̄ (code for process p).
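The counting and deduplication logic of Figure 2 can be exercised in a small lockstep simulation. This is a hypothetical sketch, not the algorithm itself: it assumes every message is delivered instantly and reliably (an idealized run after GST in which the timeout never matters) and that crashes happen at round boundaries; the function simulate, the process names, and the tuple layout (r, r', target identifier, sender identifier) for P_REPLY messages are our own choices.

```python
from collections import Counter

def simulate(ident, crash_round, rounds):
    """Lockstep sketch of Figure 2 with instantaneous, reliable delivery.

    ident: process name -> identifier (names exist only in the simulation).
    crash_round: process name -> first round in which it no longer takes
    steps (absent means correct). Returns h_trusted of every correct
    process after the last round.
    """
    latest = {p: {} for p in ident}   # latest_r_p[id], created on demand
    inbox = {p: [] for p in ident}    # P_REPLY messages received so far
    h_trusted = {p: Counter() for p in ident}
    for r in range(1, rounds + 1):
        alive = [p for p in ident if crash_round.get(p, rounds + 1) > r]
        polls = [(r, ident[q]) for q in alive]        # POLLING (Line 10)
        for p in alive:                               # Task T2 (Lines 22-31)
            for rq, idq in polls:
                last = latest[p].setdefault(idq, 0)
                if last < rq:                         # one reply per id/round
                    reply = (last + 1, rq, idq, ident[p])
                    for dest in alive:                # broadcast, incl. p
                        inbox[dest].append(reply)
                latest[p][idq] = max(last, rq)
        for p in alive:                               # Lines 12-17
            h_trusted[p] = Counter(
                sender for lo, hi, target, sender in inbox[p]
                if target == ident[p] and lo <= r <= hi)
    return {p: h_trusted[p] for p in ident if p not in crash_round}


ident = {"v": "x", "w": "x", "u": "y", "z": "z"}
print(simulate(ident, {"z": 2}, rounds=3)["v"])  # Counter({'x': 2, 'y': 1})
```

Although the homonymous processes v and w both poll with identifier x in every round, each responder broadcasts a single reply per round for x, and both v and w count it; so each correct process appears exactly once in h_trusted, which ends up equal to I(Correct).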

It is important to remark that, for each identifier id and each round r, each process q broadcasts at most one (P_REPLY, r', r'', id, id(q)) message with r' ≤ r ≤ r'', since its consecutive replies to id cover disjoint ranges of rounds (Lines 28–31). So, if processes v and w with id(v) = id(w) = x broadcast two (POLLING, r, x) messages, then each process q broadcasts only one (P_REPLY, r', r'', x, id(q)) message with r' ≤ r ≤ r''. Note that eventually (at least after GST) each P_REPLY message sent by any process is received by all correct processes. Hence, eventually processes v and w both receive all the P_REPLY messages generated in response to their POLLING messages.

Finally, Lines 33–34 of Task T2 allow process p to adapt the variable timeout_p to the communication latency and process speed. When process p receives an outdated (P_REPLY, r, −, id(p), −) message (i.e., a message with round r less than its current round r_p), it increments its variable timeout_p.
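The effect of Lines 33–34 can be illustrated with a deliberately simplified model: a single responder, and a hypothetical per-round delay between p's POLLING and the arrival of the P_REPLY that covers that round. Whenever the reply misses the round, it arrives later as an outdated message and the timeout grows by one. This is only a sketch of the intuition; the real algorithm may receive several outdated replies per round, one per responder.

```python
def adapt_timeout(delays):
    """Idealized sketch of the timeout adaptation of Lines 33-34 of Figure 2.

    delays[k]: hypothetical time between p's POLLING of round k + 1 and the
    arrival of the P_REPLY covering that round. Returns the timeout used in
    each round.
    """
    timeout = 1
    used = []
    for d in delays:
        used.append(timeout)
        if d > timeout:
            # The reply misses the loop of Lines 14-16; when it arrives,
            # r < r_p holds, so p increments its timeout once.
            timeout += 1
    return used


# Large delays before GST, then delays bounded by 3 (the bound is unknown to p).
print(adapt_timeout([9, 2, 5, 3, 3, 3, 3, 3]))  # [1, 2, 2, 3, 3, 3, 3, 3]
```

Since delays are bounded after GST and the timeout never decreases, only finitely many increments can occur before replies always arrive in time, which is the intuition behind the proof of Lemma 2.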

Lemma 1. Given processes p ∈ Correct and q ∉ Correct, there is a round r such that p does not receive any (P_REPLY, ρ, ρ', id(p), id(q)) message from q with ρ' ≥ r.



Proof: There is a time t at which q stops taking steps. If q ever sent a (P_REPLY, −, −, id(p), id(q)) message, consider the largest x such that q sent a message (P_REPLY, −, x, id(p), id(q)). Otherwise, let x = 0. Then, the claim holds for r = x + 1. ■

Lemma 2. Given processes p, q ∈ Correct, there is a round r such that, for all rounds r' ≥ r, when p executes the loop of Lines 14–16 with r_p = r', it has received a message (P_REPLY, ρ, ρ', id(p), id(q)) from q with ρ ≤ r' ≤ ρ'.

Proof: Observe that, since p is correct, it repeats forever the loop of Lines 9–19, with the value of r_p increasing by one unit at each iteration. Hence, after GST, p keeps sending (POLLING, −, id(p)) messages with increasing round numbers, which are eventually received by q. Then, since q is correct, after GST it eventually sends infinitely many (P_REPLY, −, −, id(p), id(q)) messages with increasing round numbers. Let (P_REPLY, x, −, id(p), id(q)) be the first such message sent by q after GST. Then, for each round number y ≥ x, there is some message (P_REPLY, ρ, ρ', id(p), id(q)) sent by q with ρ ≤ y ≤ ρ', and these messages are delivered at p at most δ time after being sent.

Now, assume for contradiction that for each round y ≥ x there is a round y' ≥ y such that, when p executes the loop of Lines 14–16 with r_p = y', it has not received the message (P_REPLY, ρ, ρ', id(p), id(q)) from q with ρ ≤ y' ≤ ρ'. But every time this happens, when the message is finally received, r_p has already been incremented in Line 18 and, hence, timeout_p is incremented (Lines 33–34). Then, eventually, by some round r, the value of timeout_p will be greater than 2δ + γ, where γ is the maximum time that q takes to execute Lines 22–31. From then on, p receives the message (P_REPLY, ρ, ρ', id(p), id(q)) with ρ ≤ r' ≤ ρ' before executing the loop of Lines 14–16 with r_p = r', for all r' ≥ r. We have reached a contradiction, and the claim of the lemma follows. ■

Theorem 4. The algorithm of Figure 2 implements a failure detector of the class ◇HP̄ in a system HPS[∅] (homonymous system where processes are partially synchronous and links are eventually timely), even if the membership is not known initially.

Proof: Consider a correct process p. Applying Lemma 1 to each of the finitely many faulty processes, there is a round r such that p does not receive any (P_REPLY, ρ, ρ', id(p), −) message with ρ' ≥ r from any faulty process. Applying Lemma 2 to each of the finitely many correct processes, there is a round r' such that, for all rounds r'' ≥ r', when p executes the loop of Lines 14–16 with r_p = r'', it has received a (P_REPLY, ρ, ρ', id(p), −) message with ρ ≤ r'' ≤ ρ' from each correct process (and exactly one such message per correct process, as remarked above). Hence, for every round r'' ≥ max(r, r'), when Line 17 is executed with r_p = r'', the variable h_trusted_p is updated with the multiset I(Correct). ■
We can obtain HΩ from the algorithm of Fig. 2 without additional communication. This can be done by simply including, immediately after Line 17, h_leader_p ← min(h_trusted_p) (i.e., the smallest identifier in h_trusted_p) and h_multiplicity_p ← mult_{h_trusted_p}(h_leader_p) (i.e., the number of copies of h_leader_p in h_trusted_p).
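As an illustration, the following minimal Python sketch performs the two assignments just described on a snapshot of h_trusted_p. It assumes that the multiset is represented as a `collections.Counter` of identifiers and that identifiers are totally ordered; the function name `h_omega_from_trusted` is hypothetical and not part of Figure 2.

```python
from collections import Counter

def h_omega_from_trusted(h_trusted):
    """Return (h_leader, h_multiplicity) computed from the multiset h_trusted.

    h_leader is the smallest identifier in the multiset, and h_multiplicity
    is the number of copies of that identifier (its multiplicity).
    """
    positive = +h_trusted  # drop entries whose count is zero or negative
    if not positive:
        raise ValueError("h_trusted is empty")
    leader = min(positive)             # smallest identifier present
    return leader, positive[leader]    # and how many processes carry it

# Identifiers 5, 3, 7, 3 are trusted: identifier 3 is shared by two processes.
print(h_omega_from_trusted(Counter([5, 3, 7, 3])))  # (3, 2)
```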

Corollary 1. The algorithm of Figure 2 can be changed to implement a failure detector of the class HΩ in a system HPS[∅] (homonymous system where processes are partially synchronous and links are eventually timely), even if the membership is not known initially.

B. Implementation of HΣ

Figure 3 implements HΣ in HSS[∅], where processes are synchronous, links are timely, and membership is not known.

Brief description of the algorithm. It runs in synchronous steps. In each step every process p broadcasts an ⟨IDENT, id(p)⟩ message. Then, process p waits for the ⟨IDENT, −⟩ messages sent through reliable links in this synchronous step by alive processes. Process p gathers in the multiset variable mset_p the identifiers id of all the ⟨IDENT, id⟩ messages received. At the end of this step, variables h_quora_p and h_labels_p are updated with the value of mset_p. Note that, for process p, the label x of a quorum (x, m) is formed by the multiset mset_p (i.e., x = m = mset_p).
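To make the data flow of Figure 3 concrete, here is a minimal Python sketch of the local update a process performs at the end of each synchronous step. It assumes that the network layer has already delivered the list of identifiers received in the step (homonyms appear several times), and it encodes a multiset as a frozenset of (identifier, multiplicity) pairs so that it can be stored in a set; the names `as_multiset` and `sync_step` are hypothetical.

```python
from collections import Counter

def as_multiset(ids):
    """Hashable multiset: a frozenset of (identifier, multiplicity) pairs."""
    return frozenset(Counter(ids).items())

def sync_step(received_ids, h_quora, h_labels):
    """End of one synchronous step at process p (Lines 6-8 of Figure 3)."""
    mset = as_multiset(received_ids)   # Line 6: identifiers of IDENT messages
    h_quora.add((mset, mset))          # Line 7: the label equals the quorum
    h_labels.add(mset)                 # Line 8
    return mset

h_quora, h_labels = set(), set()
# Step 1: four processes are alive; two of them share the identifier "a".
sync_step(["a", "a", "b", "c"], h_quora, h_labels)
# Step 2: one of the processes named "a" has crashed.
sync_step(["a", "b", "c"], h_quora, h_labels)
# Step 3: no further crash, so the same multiset is received again.
sync_step(["a", "b", "c"], h_quora, h_labels)

print(len(h_quora), len(h_labels))  # 2 2
```

Because both variables are sets that only grow, the sketch also shows why Validity and Monotonicity hold: a label is paired with exactly one quorum (itself), and nothing is ever removed.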

Theorem 5. The algorithm of Figure 3 implements a failure detector of the class HΣ in a system HSS[∅] (homonymous synchronous systems), even if the membership is not known initially.

Proof: From the definition of HΣ, it is enough to prove the following properties.

Validity. Since h_quora_p is a set, and the elements included in it are of the form (mset, mset) (see Line 7 in Figure 3), there cannot be two pairs with the same label.

Monotonicity. The monotonicity of h_labels_p in Figure 3 holds because h_labels_p is initially empty and, at each step, h_labels_p either grows or remains the same (see Line 8 in Figure 3). Similarly, the monotonicity of h_quora_p in Figure 3 follows from the fact that h_quora_p is initially empty, and any element (mset, mset) included in it is never removed (see Line 7 in Figure 3).

Liveness. Let s be the synchronous step in which the last faulty process crashed. Then, in every step s' after s, only correct processes execute. Consider any process p ∈ Correct. In step s', p receives messages from all correct processes and, hence, mset_p = I(Correct). Then, process p includes (I(Correct), I(Correct)) in h_quora_p, and I(Correct) in h_labels_p. Therefore, each correct process p is in S(I(Correct)). So, after step s, for each correct process p, the pair (I(Correct), I(Correct)) is in h_quora_p, and I(Correct) = I(S(I(Correct)) ∩ Correct).



1 h_labels_p ← ∅;
2 h_quora_p ← ∅;
3 for each synchronous step do
4   broadcast ⟨IDENT, id(p)⟩;
5   wait for the messages sent in this synchronous step;
6   mset_p ← multiset of identifiers received in ⟨IDENT, −⟩ messages;
7   h_quora_p ← h_quora_p ∪ {(mset_p, mset_p)};
8   h_labels_p ← h_labels_p ∪ {mset_p};
9 end for;

Figure 3. Algorithm to implement HΣ without knowledge of membership (code for process p)

Safety. Consider two pairs (x1, x1) ∈ h_quora_{p1}^{τ1} and (x2, x2) ∈ h_quora_{p2}^{τ2}, for any p1, p2 ∈ Π and any τ1, τ2 ∈ ℕ.

Let M1 be the set of processes from which p1 received ⟨IDENT, −⟩ messages in the synchronous step in which (x1, x1) was inserted for the first time in h_quora_{p1}. Observe that Correct ⊆ M1. Furthermore, any process p ∈ S(x1) must also be in M1 (i.e., S(x1) ⊆ M1). Also, x1 = I(M1) and, hence, |x1| = |M1|. Therefore, the only set Q1 ⊆ S(x1) such that I(Q1) = x1 is Q1 = M1. We define M2 similarly, and conclude that the only set Q2 ⊆ S(x2) such that I(Q2) = x2 is Q2 = M2. Since Q1 ∩ Q2 ⊇ Correct ≠ ∅, the safety property holds. ■

V. Solving Consensus in Homonymous Systems 

We present in this section two algorithms. One algorithm implements Consensus in HAS[t < n/2, HΩ], that is, in a homonymous asynchronous system with reliable links, using the failure detector HΩ, when a majority of processes are correct. The other algorithm implements Consensus in HAS[HΩ, HΣ], that is, in a homonymous asynchronous system with reliable links, using the failure detectors HΩ and HΣ.

A. Implementing Consensus in HAS[t < n/2, HΩ]

Let us consider HAS[t < n/2, HΩ], where membership is unknown but the number of processes is known (that is, n). Let us assume a majority of correct processes (i.e., t < n/2). We say that a process p is a leader if it is correct and, after some finite time, D.h_leader_q = id(p) permanently for each correct process q. By definition of HΩ, there has to be at least one leader.

The algorithm of Figure 4 is derived from the algorithm in Figure 4 of [6], proposed for anonymous systems, and adapted here to homonymous systems. The algorithm of Figure 4 uses a failure detector of class HΩ (instead of AΩ), and a new initial Leaders' Coordination Phase has been added. The purpose of this initial phase is to guarantee that, after a given round, all leaders propose the same value in each round.

The algorithm works in rounds, and it has four phases (Leaders' Coordination Phase, Phase 0, Phase 1 and Phase 2). Every process p begins the Leaders' Coordination Phase by broadcasting a ⟨COORD, id(p), r, est1_p⟩ message. If process p considers itself a leader (querying the failure detector D of class HΩ), it has to wait until it receives ⟨COORD, id(p), r, est1⟩ messages sent by all its homonymous processes, whose number it also obtains by querying D through D.h_multiplicity_p (Lines 10–11). After that, process p updates its estimate est1_p with the minimal value proposed among all its homonyms. Note that eventually all its correct homonyms will be leaders too. Hence, eventually all leaders will also choose the same minimal value in est1.

1 operation propose(v_p):
2   est1_p ← v_p; r_p ← 0;
3   start Tasks T1 and T2;
4
5 Task T1
6   repeat forever
7     r_p ← r_p + 1;
8     // Leaders' Coordination Phase
9     broadcast ⟨COORD, id(p), r_p, est1_p⟩;
10    wait until (D.h_leader_p ≠ id(p)) ∨
11      (D.h_multiplicity_p messages ⟨COORD, id(p), r_p, −⟩ received);
12    if (some message ⟨COORD, id(p), r_p, −⟩ received) then
13      est1_p ← min{est_q : id(p) = id(q) ∧
14        ⟨COORD, id(q), r_p, est_q⟩ received} end if;
15    // Phase 0
16    wait until (D.h_leader_p = id(p)) ∨ (⟨PH0, r_p, v⟩ received);
17    if (⟨PH0, r_p, v⟩ received) then est1_p ← v end if;
18    broadcast ⟨PH0, r_p, est1_p⟩;
19    // Phase 1
20    broadcast ⟨PH1, r_p, est1_p⟩;
21    wait until ⟨PH1, r_p, −⟩ received from n − t processes;
22    if (the same estimate v received from > n/2 processes) then
23      est2_p ← v
24    else
25      est2_p ← ⊥
26    end if;
27    // Phase 2
28    broadcast ⟨PH2, r_p, est2_p⟩;
29    wait until ⟨PH2, r_p, −⟩ received from n − t processes;
30    let rec_p = {est2 : message ⟨PH2, r_p, est2⟩ received};
31    if ((rec_p = {v}) ∧ (v ≠ ⊥)) then
32      broadcast ⟨DECIDE, v⟩; return(v) end if;
33    if ((rec_p = {v, ⊥}) ∧ (v ≠ ⊥)) then est1_p ← v end if;
34    if (rec_p = {⊥}) then skip end if;
35  end repeat;
36
37 Task T2
38   upon reception of ⟨DECIDE, v⟩ do
39     broadcast ⟨DECIDE, v⟩; return(v)

Figure 4. Consensus algorithm in HAS[t < n/2, HΩ] (code for process p). It uses detector D ∈ HΩ.
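The Leaders' Coordination Phase (Lines 9–14 of Figure 4) can be sketched in Python as two local functions: the wait condition of Lines 10–11 and the estimate update of Lines 12–14. This is a sketch under stated assumptions, not the authors' code: the COORD messages of the current round are assumed to be already collected as (identifier, estimate) pairs, including p's own message, and the names `from_homonyms`, `coordination_ready` and `coordinate` are hypothetical. Messaging and blocking are not modeled.

```python
def from_homonyms(my_id, coord_msgs):
    """Estimates carried by the COORD messages whose identifier equals my_id."""
    return [est for (sender_id, est) in coord_msgs if sender_id == my_id]

def coordination_ready(my_id, h_leader, h_multiplicity, coord_msgs):
    """Wait condition of Lines 10-11: p does not consider itself the leader,
    or it already holds h_multiplicity COORD messages with its own identifier."""
    return (h_leader != my_id
            or len(from_homonyms(my_id, coord_msgs)) >= h_multiplicity)

def coordinate(my_id, est1, coord_msgs):
    """Lines 12-14: adopt the minimum estimate proposed by the homonyms."""
    ests = from_homonyms(my_id, coord_msgs)
    return min(ests) if ests else est1

# Process p has identifier 7, believes it is the leader, and D reports 3 homonyms.
msgs = [(7, 40), (2, 10), (7, 25)]        # COORD messages of round r seen so far
print(coordination_ready(7, 7, 3, msgs))  # False: only 2 messages carry id 7
msgs.append((7, 30))
print(coordination_ready(7, 7, 3, msgs))  # True
print(coordinate(7, 40, msgs))            # 25
```

Since every homonymous leader eventually waits for the same set of messages, every one of them computes the same minimum, which is the property Lemma 4 relies on.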

In Phase 0, if process p considers itself a leader (querying the failure detector D of class HΩ) (Line 16), it broadcasts a ⟨PH0, r, est1_p⟩ message with its estimate in est1_p. Otherwise, process p waits until a ⟨PH0, r, est1_ℓ⟩ message is received from one of the leader processes ℓ, and updates est1_p with that value (Lines 16–17). Note that, after the Leaders' Coordination Phase, eventually each leader ℓ broadcasts ⟨PH0, −, est1_ℓ⟩ messages with the same value in est1_ℓ.



The rest of the algorithm is similar to the algorithm in Figure 4 of [6]. We omit further details due to space restrictions. The following lemmas are the key to the correctness of the algorithm. They show that, even with multiple leaders, these will eventually converge to propose the same value in each round.
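For readers who do not have [6] at hand, the local decision rules of Phase 1 and Phase 2 in Figure 4 (Lines 22–34) can be sketched in Python as follows. This is only an illustration: it assumes the estimates carried by the received ⟨PH1⟩ and ⟨PH2⟩ messages have already been collected into lists, it uses `None` for ⊥, and the function names are hypothetical; waiting and broadcasting are not modeled.

```python
from collections import Counter

BOTTOM = None  # stands for the default value ⊥

def phase1_est2(ph1_estimates, n):
    """Lines 22-26: est2 = v if v was received from more than n/2 processes, else ⊥."""
    if ph1_estimates:
        v, count = Counter(ph1_estimates).most_common(1)[0]
        if count > n / 2:
            return v
    return BOTTOM

def phase2_outcome(ph2_estimates, est1):
    """Lines 30-34: return ("decide", v) or ("continue", next value of est1).

    Two distinct non-⊥ values cannot both be backed by a strict majority in
    Phase 1, so at most one non-⊥ value appears in rec_p.
    """
    rec = set(ph2_estimates)
    values = rec - {BOTTOM}
    if len(values) == 1:
        v = next(iter(values))
        if BOTTOM not in rec:
            return ("decide", v)    # rec_p = {v}, v != ⊥ (Lines 31-32)
        return ("continue", v)      # rec_p = {v, ⊥} (Line 33)
    return ("continue", est1)       # rec_p = {⊥} (Line 34): est1 unchanged

n = 5
print(phase1_est2([4, 4, 4, 9], n))            # 4
print(phase1_est2([4, 4, 9, 9], n))            # None
print(phase2_outcome([4, 4, 4], 1))            # ('decide', 4)
print(phase2_outcome([4, BOTTOM, 4], 1))       # ('continue', 4)
```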

Lemma 3. No correct process blocks forever in the Leaders' Coordination Phase.

Proof: The only place in the Leaders' Coordination Phase where a process can block is the wait statement of Lines 10–11. A correct process that is not a leader does not block permanently in these lines, because eventually the first part of the wait condition is satisfied. Let us assume, for contradiction, that some leader blocks permanently in Line 11. Let us consider the smallest round r in which some leader p blocks. By definition of r, each leader q eventually reaches round r and (even if it blocks in r) broadcasts ⟨COORD, id(q), r, −⟩, where id(q) = id(p), in Line 9. (Observe that all processes send ⟨COORD, −, −, −⟩ messages in Line 9, even if they do not consider themselves leaders.) Eventually, all these messages are delivered to p, and D.h_multiplicity_p is permanently the number of leaders. Hence, the second part of the wait condition (Line 11) is satisfied. Thus, p is not blocked anymore, and we reach a contradiction. ■

Lemma 4. There is a round r such that, at every round r' > r, all leaders broadcast the same value in Phase 0 of round r'.

Proof: Eventually all leaders broadcast the same value because, after some round, all leaders start Phase 0 with the same value in est1. Consider a time τ when all faulty processes have crashed and the failure detector D is stable (i.e., ∀τ' > τ, ∀p ∈ Correct, D.h_leader_p = ℓ, with ℓ ∈ I(Correct), and D.h_multiplicity_p = mult_{I(Correct)}(ℓ)). Let r be the largest round reached by any process at time τ. Then, for any round r' > r, either all leaders p have the same estimate est1_p at the beginning of Phase 0 of round r' (Line 16), or there has been a decision in a round smaller than r'. To prove this, let us assume that no decision is reached in a round smaller than r'. Then, since the leaders do not block forever in any round (see Lemma 3), they execute Line 9 in round r'. Since the failure detector is stable, they also wait for the second part of the wait condition of Lines 10–11 (since the first part is not satisfied). When any leader p executes the Leaders' Coordination Phase of r', it blocks in Lines 10–11 until it receives D.h_multiplicity_p messages from the other leaders. By the stability of the HΩ failure detector, D.h_multiplicity_p is the exact number of leaders. Also, from the definition of τ and r, no faulty process with identifier D.h_leader_p is alive, and all the messages such processes sent correspond to rounds smaller than r'. Hence, each leader p will wait to receive messages from all the other leaders and will set est1_p to the minimum of the same set of values (Lines 13–14). ■

Theorem 6. The algorithm of Figure 4 solves consensus in HAS[t < n/2, HΩ].

Proof: From the definition of Consensus, it is enough 
to prove the following properties. 

Validity. The variable est1 is initialized with the value proposed by its process (Line 2). The value of est1 may be updated in Lines 14 or 17 with values of est1 broadcast by other processes. The variable est2 is initialized and updated with est1 (Line 23) or ⊥ (Line 25). The value of est1 may be updated in Line 33 with values of est2 (different from ⊥) broadcast by other processes. The value decided in Line 32 is a value of est2 that was broadcast by some process. As it is not possible to decide the value ⊥ (Line 31), the decided value has to be one of the values proposed by the processes. Hence, the validity property holds.

Agreement. Identical to the agreement property of the algorithm in Figure 4 of [6].

Termination. From Lemmas 3 and 4, after some round r, all leaders hold the same value v in est1 when they start executing Phase 0 of round r' > r (Line 16), and they broadcast this same value v (Line 18). This is the same situation as having a single leader with value v stored in est1 when Phase 0 is reached. Hence, as Phase 0 starts under the same conditions as in the algorithm of Figure 4 of [6], the same proof can be used to prove the termination property. ■

B. Implementing Consensus in HAS[HΩ, HΣ]

Figure 5 implements Consensus in HAS[HΩ, HΣ]. Note that it is a variation of the algorithm of Figure 3 of ID where, as in the previous case, we have added a preliminary phase acting as a barrier, so that homonymous leaders eventually "agree" on the same estimate value est1 to propose. Once this issue has been solved (as was proven for the previous algorithm), the use that this algorithm makes of the failure detector HΣ is very similar to the use that the algorithm of Figure 3 of H makes of the AΣ failure detector.

Lemma 5. No correct process blocks forever in the repeat 
loops of Phases 1 and 2. 

Proof: Note that if a correct process decides (Line 51), then the claim follows. Consider the repeat loop of Phase 1 (Lines 22–38). Let us assume that some correct process is blocked forever in this loop. Then, let us consider the first round r in which a correct process blocks forever. Hence, all correct processes must block forever in the same loop in round r. Otherwise, some process broadcasts a message ⟨PH2, −, r, −, −, −⟩ and, from Line 24, no correct process would block forever in this loop of round r. Let us consider a correct process p, and the pair (x, m) that guarantees the liveness property for p. Then, there is a time at which (x, m) ∈ D2.h_quora_p and every correct process q in S(x) ∩ Correct has x ∈ D2.h_labels_q. Note that, from Lines 32–36, every change in the variable D2.h_labels of a process creates a new sub-round, and that all processes broadcast their current value of D2.h_labels in each new sub-round. Therefore, eventually, p will receive messages ⟨PH1, −, r, sr, cl, −⟩ from all these processes such that x ∈ cl. Hence, the condition of Lines 25–28 is satisfied, and p will exit the loop of Phase 1. The argument for the repeat loop of Phase 2 is verbatim. ■

1 operation propose(v_p):
2   est1_p ← v_p; r_p ← 0;
3   start Tasks T1 and T2;
4
5 Task T1
6   repeat forever
7     r_p ← r_p + 1;
8     // Leaders' Coordination Phase
9     broadcast ⟨COORD, id(p), r_p, est1_p⟩;
10    wait until (D1.h_leader_p ≠ id(p)) ∨
11      (D1.h_multiplicity_p messages ⟨COORD, id(p), r_p, −⟩ received);
12    if (some message ⟨COORD, id(p), r_p, −⟩ received) then
13      est1_p ← min{est_q : id(p) = id(q) ∧
14        ⟨COORD, id(q), r_p, est_q⟩ received} end if;
15    // Phase 0
16    wait until (D1.h_leader_p = id(p)) ∨ (⟨PH0, r_p, v⟩ received);
17    if (⟨PH0, r_p, v⟩ received) then est1_p ← v end if;
18    broadcast ⟨PH0, r_p, est1_p⟩;
19    // Phase 1
20    sr_p ← 1; current_labels_p ← D2.h_labels_p;
21    broadcast ⟨PH1, id(p), r_p, sr_p, current_labels_p, est1_p⟩;
22    repeat
23      if (⟨PH2, −, r_p, −, −, est2⟩ received) then
24        est2_p ← est2; exit inner repeat loop end if;
25      if ((∃(x, mset) ∈ D2.h_quora_p) ∧ (∃sr ∈ ℕ) ∧
26          (∃ set M of messages ⟨PH1, −, r_p, sr, −, −⟩), such that,
27          (∀⟨PH1, −, −, −, cl, −⟩ ∈ M, x ∈ cl) ∧
28          (mset = {i : ⟨PH1, i, −, −, −, −⟩ ∈ M})) then
29        if (all msgs in M contain the same estimate v) then
30          est2_p ← v else est2_p ← ⊥ end if;
31        exit inner repeat loop;
32      else if (current_labels_p ≠ D2.h_labels_p) ∨
33          (⟨PH1, −, r_p, sr, −, −⟩ received with sr > sr_p) then
34        sr_p ← sr_p + 1; current_labels_p ← D2.h_labels_p;
35        broadcast ⟨PH1, id(p), r_p, sr_p, current_labels_p, est1_p⟩
36      end if
37      end if
38    end repeat;
39    // Phase 2
40    sr_p ← 1; current_labels_p ← D2.h_labels_p;
41    broadcast ⟨PH2, id(p), r_p, sr_p, current_labels_p, est2_p⟩;
42    repeat
43      if (⟨COORD, −, r_p + 1, −⟩ received) then
44        exit inner repeat loop end if;
45      if ((∃(x, mset) ∈ D2.h_quora_p) ∧ (∃sr ∈ ℕ) ∧
46          (∃ set M of messages ⟨PH2, −, r_p, sr, −, −⟩), such that,
47          (∀⟨PH2, −, −, −, cl, −⟩ ∈ M, x ∈ cl) ∧
48          (mset = {i : ⟨PH2, i, −, −, −, −⟩ ∈ M})) then
49        let rec_p = the set of estimates contained in M;
50        if ((rec_p = {v}) ∧ (v ≠ ⊥)) then
51          broadcast ⟨DECIDE, v⟩; return(v) end if;
52        if ((rec_p = {v, ⊥}) ∧ (v ≠ ⊥)) then est1_p ← v end if;
53        if (rec_p = {⊥}) then skip end if;
54        exit inner repeat loop
55      else if (current_labels_p ≠ D2.h_labels_p) ∨
56          (⟨PH2, −, r_p, sr, −, −⟩ received with sr > sr_p) then
57        sr_p ← sr_p + 1; current_labels_p ← D2.h_labels_p;
58        broadcast ⟨PH2, id(p), r_p, sr_p, current_labels_p, est2_p⟩
59      end if
60      end if
61    end repeat
62  end repeat;
63
64 Task T2
65   upon reception of ⟨DECIDE, v⟩ do
66     broadcast ⟨DECIDE, v⟩; return(v)

Figure 5. Consensus algorithm in HAS[HΩ, HΣ] (code for process p). It uses detectors D1 ∈ HΩ and D2 ∈ HΣ.
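The quorum test used in Lines 25–28 (and, with ⟨PH2⟩ messages, in Lines 45–48) of Figure 5 is the least familiar step, so here is a minimal Python sketch of it. It is written under stated assumptions: each delivered message is a separate (sender_id, round, sub_round, labels, estimate) tuple, h_quora is a list of (label, Counter) pairs, and the names `find_quorum` and `est2_from` are hypothetical. The sketch looks for a sub-round in which enough messages whose senders hold label x are available to match the multiset of identifiers exactly.

```python
from collections import Counter

def find_quorum(h_quora, messages, r):
    """Sketch of the test of Lines 25-28 (Phase 1) or 45-48 (Phase 2).

    h_quora:  list of (x, mset) pairs; x is a hashable label, mset a Counter
              of identifiers.
    messages: one (sender_id, rnd, sr, cl, est) tuple per delivered message,
              where cl is the set of labels the sender held (current_labels).
    Returns (x, M) for some matching pair and message set M, or None.
    """
    for x, mset in h_quora:
        for sr in sorted({m[2] for m in messages if m[1] == r}):
            # messages of round r and sub-round sr whose sender held label x
            eligible = [m for m in messages if m[1] == r and m[2] == sr and x in m[3]]
            available = Counter(m[0] for m in eligible)
            if all(available[i] >= k for i, k in mset.items()):
                need, chosen = Counter(mset), []
                for m in eligible:      # take exactly mset[i] messages per identifier i
                    if need[m[0]] > 0:
                        chosen.append(m)
                        need[m[0]] -= 1
                return x, chosen
    return None

def est2_from(chosen):
    """Lines 29-30: the common estimate of M, or ⊥ (None) if estimates differ."""
    ests = {m[4] for m in chosen}
    return ests.pop() if len(ests) == 1 else None

h_quora = [("L", Counter({"a": 2, "b": 1}))]
msgs = [("a", 1, 1, {"L"}, 5), ("b", 1, 1, {"L"}, 5),
        ("a", 1, 1, {"K"}, 5), ("a", 1, 2, {"L"}, 5)]
print(find_quorum(h_quora, msgs, 1))  # None: only one "a" holds label L per sub-round
msgs.append(("a", 1, 1, {"K", "L"}, 5))
x, M = find_quorum(h_quora, msgs, 1)
print(x, len(M), est2_from(M))        # L 3 5
```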

Lemma 6. No two processes decide different values in the same round.

Proof: Let us assume that processes p1 and p2 decide values v1 and v2 in sub-rounds sr1 and sr2, respectively, of the same round r (in Line 51). Let (x1, m1) and M1 be the pair in D2.h_quora_{p1} and the set of messages that satisfy the condition of Lines 45–48 for p1. Since, for each message ⟨PH2, −, r, sr1, cl, −⟩ ∈ M1, it holds that x1 ∈ cl, if Q1 is the set of senders of the messages in M1, we have that Q1 ⊆ S(x1). Additionally, m1 = {i : ⟨PH2, i, −, −, −, −⟩ ∈ M1} = I(Q1). We can define (x2, m2) and M2 analogously for p2. Then, from the Safety property of HΣ, Q1 ∩ Q2 ≠ ∅. Let p_i ∈ Q1 ∩ Q2. Then, process p_i must have broadcast messages ⟨PH2, id(p_i), r, sr1, −, v1⟩ and ⟨PH2, id(p_i), r, sr2, −, v2⟩ (Lines 41 and 58). Since the estimate est2 of p_i does not change between sub-rounds (inner repeat loop, Lines 42–61), it must hold that v1 = v2. Indeed, from the decision condition (Lines 50–51), rec_{p1} = {v1} in sub-round sr1 and rec_{p2} = {v2} in sub-round sr2, and both processes decide the same value. Hence, no two processes decide different values in the same round. ■

Theorem 7. The algorithm of Figure 5 solves consensus in HAS[HΩ, HΣ].

Proof: The proof of this theorem is similar to the proof of Theorem 5 of [5] (full version of [4]), with the following changes. Observe that the Leaders' Coordination Phase and Phase 0 of the algorithms in Figures 4 and 5 are the same. Hence, Lemmas 3 and 4 also apply to the algorithm of Figure 5. Then, the termination property can be proven in a similar way as in [5] (Lemmas 1 and 2), but using Lemmas 3 and 4 together with Lemma 5. The proof of the agreement property is also similar to Lemma 3 of Q, but using Lemma 6. ■
The algorithm of Figure 5 can easily be transformed into an algorithm that solves consensus in AAS[AΩ, HΣ] (an anonymous system with detectors AΩ and HΣ). For that, given a failure detector D3 ∈ AΩ, it is enough to remove the Leaders' Coordination Phase and, in Phase 0, to replace (D1.h_leader_p = id(p)) by (D3.a_leader_p). The resulting Phase 0 is the same as Phase 1 in the algorithm of Figure 3 of in, and has the same properties.
1 Init
2   h_labels_p ← {s : (s ⊆ I(Π)) ∧ (id(p) ∈ s)};
3   h_quora_p ← ∅;
4 repeat forever
5   q ← D.trusted_p;
6   h_quora_p ← h_quora_p ∪ {(q, q)};
7 end repeat;

Figure 6. Algorithm to transform D ∈ Σ to HΣ with initial knowledge of membership (code for process p).

1 Init
2   h_labels_p ← ∅;
3   h_quora_p ← ∅;
4   mship_p ← ∅;
5   start tasks T1 and T2;
6 Task T1
7   repeat forever
8     broadcast ⟨IDENT, id(p)⟩;
9     q ← D.trusted_p;
10    h_quora_p ← h_quora_p ∪ {(q, q)};
11  end repeat;
12
13 Task T2
14   upon reception of ⟨IDENT, i⟩ do
15     mship_p ← mship_p ∪ {i};
16     h_labels_p ← {s : (s ⊆ mship_p) ∧ (id(p) ∈ s)};

Figure 7. Algorithm to transform D ∈ Σ to HΣ without initial knowledge of membership (code for process p).

Appendix 

A. Reductions between Failure Detectors 

1) From Σ to HΣ: We prove that, if identifiers are unique, a detector of class HΣ can be obtained from any detector D of class Σ.

Theorem 8. A failure detector of class HΣ can be obtained from any detector D of class Σ in a system with unique identifiers, under either one of the following conditions:

1) without any communication, if every process initially knows the membership I(Π), or

2) in system AS[∅] (the membership does not need to be known initially).

Proof: Let D.trusted_p be the variable of the Σ failure detector D at process p. Figures 6 and 7 present the algorithms that transform D into a failure detector of class HΣ in Cases 1 and 2, respectively. In both cases, at each process p, initially h_quora_p ← ∅, and infinitely often this variable is updated with the following statements: q ← D.trusted_p, and h_quora_p ← h_quora_p ∪ {(q, q)}. In Case 1, initially every process p sets h_labels_p ← {s : (s ⊆ I(Π)) ∧ (id(p) ∈ s)} and never changes it in the run. In Case 2, every process p initially sets h_labels_p ← ∅ and repeatedly broadcasts a message IDENT(id(p)). Process p also has a variable mship_p, initially set to ∅. After receiving a message IDENT(i), process p updates mship_p ← mship_p ∪ {i} and h_labels_p ← {s : (s ⊆ mship_p) ∧ (id(p) ∈ s)}.



We now prove the properties of HΣ:

• Validity. Since h_quora_p is a set, and the elements included in it are of the form (q, q) (see Line 6 in Figure 6 and Line 10 in Figure 7), there cannot be two pairs with the same label.

• Monotonicity. The monotonicity of h_labels_p in Figure 6 is obvious, because it is initialized in Line 2 and never changes. With respect to Figure 7, h_labels_p is initially empty and is determined by the set mship_p, such that if mship_p grows then h_labels_p either grows or remains the same. Hence, h_labels_p never decreases, because mship_p never decreases (see Line 15 in Figure 7). The monotonicity of h_quora_p in Figures 6 and 7 follows from the fact that h_quora_p is initially empty, and any element (q, q) included in it is never removed.

• Liveness. Consider any correct process p. In Figure 7, eventually I(Correct) ⊆ mship_p permanently (from the exchange of IDENT messages and Line 15 of Figure 7). Then, in both algorithms, eventually {s : (s ⊆ I(Correct)) ∧ (id(p) ∈ s)} ⊆ h_labels_p permanently (from Line 2 in Figure 6 and Line 16 in Figure 7). Hence, there is a time τ after which, for every set s ⊆ I(Correct), I(S(s)) = s and S(s) ⊆ Correct. The Liveness property of Σ guarantees that, at some time τ' ≥ τ, the variable q is assigned a set s that contains only correct processes, and (s, s) will be included in h_quora_p after that. Therefore, there is a time after which h_quora_p contains (s, s) permanently (from monotonicity). Since s ⊆ I(S(s) ∩ Correct) = I(S(s)) = s, the property follows.

• Safety. Consider two pairs (x1, m1) ∈ h_quora_{p1}^{τ1} and (x2, m2) ∈ h_quora_{p2}^{τ2}, for any p1, p2 ∈ Π and any τ1, τ2 ∈ ℕ. From the management of the h_quora variables (Lines 3, 5 and 6 in Figure 6 and Lines 3, 9 and 10 in Figure 7), m1 and m2 are values taken from D.trusted_{p1} and D.trusted_{p2}, respectively. Hence, the sets m1 and m2 must intersect, from the Safety property of the Σ failure detector D. Then, if I(Q1) = m1 and I(Q2) = m2, given that we are in a system with unique identifiers, Q1 and Q2 must intersect.

■

2) From HΣ to Σ: We now define a new class of failure detector that will be used for reductions between the above failure detector classes. While the service provided by this detector has already been used in [21], lU, it was never formally defined. The new failure detector class, denoted S, will only be defined for systems with unique identifiers, i.e., non-homonymous systems.

Definition 1. A failure detector of class S provides each process p ∈ Π, in a system with unique process identifiers, with a variable alive_p which contains a (sorted) list of process identifiers. Any failure detector of class S must satisfy the following property:

• Liveness. Eventually, the identifiers of the correct processes are permanently in the first positions of alive_p. More formally, let rank(i, alive_p) denote the position (starting from 1) of process identifier i in alive_p (with rank(i, alive_p) = ∞ if i ∉ alive_p). Then,

$$\forall p \in \mathit{Correct},\ \exists \tau \in \mathbb{N} : \forall \tau' \geq \tau,\ \forall q \in \mathit{Correct},\ \mathit{rank}(id(q), \mathit{alive}_p^{\tau'}) \leq |\mathit{Correct}|.$$

1 Init
2   alive_p ← empty list;
3   start Tasks T1 and T2;
4 Task T1
5   repeat forever
6     broadcast ⟨ALIVE, id(p)⟩;
7   end repeat;
8
9 Task T2
10  upon reception of ⟨ALIVE, i⟩ do
11    if i ∈ alive_p then move i to the first position of alive_p
12    else insert i in the first position of alive_p
13    end if;

Figure 8. Algorithm to implement a failure detector of class S without initial knowledge of membership in AS[∅] (code for process p).

1 Init
2   start Tasks T1 and T2;
3 Task T1
4   repeat forever
5     broadcast ⟨LABELS, id(p), D.h_labels_p⟩;
6     if ∃(x, m) ∈ D.h_quora_p : (idents_p[x] has been created) ∧ (m ⊆ idents_p[x]) then
7       let candidates_p = {m : ((x, m) ∈ D.h_quora_p) ∧ (idents_p[x] has been created) ∧ (m ⊆ idents_p[x])};
8       trusted_p ← any m ∈ candidates_p with smallest max_{i ∈ m} rank(i, X.alive_p);
9     end if;
10  end repeat;
11
12 Task T2
13   upon reception of ⟨LABELS, i, ℓ⟩ do
14     foreach x ∈ ℓ do
15       if idents_p[x] has not been created then create idents_p[x] ← ∅ end if;
16       idents_p[x] ← idents_p[x] ∪ {i};
17     end foreach;

Figure 9. Algorithm to transform D ∈ HΣ to Σ in a system with unique identifiers, but without initial knowledge of membership (code for process p). The algorithm uses a failure detector X of class S.

Observe that the position of the same identifier can be different at different processes, and can vary over time in the same process. From the algorithm of Figure 8 we obtain the following lemma.

Lemma 7. A failure detector of class S can be implemented in AS[∅] (an asynchronous system with unique identifiers), even when the membership is not known initially.

Proof: For each process q ∈ Correct, eventually some message ALIVE(id(q)) will be received at each process p ∈ Correct. Then id(q) will be included in alive_p and never removed after that. Given any faulty process r, p will stop receiving messages from r by some time τ. Then, after τ, process p will never receive a message ALIVE(id(r)), and id(r) will never be moved to (or inserted in) the first position of alive_p. However, after τ, p will eventually receive messages ALIVE(id(q)) from each process q ∈ Correct, and each identifier id(q) will be moved to (or inserted in) the first position of alive_p. Then, there is some time τ' > τ such that, at all times τ'' > τ', rank(id(q), alive_p) < rank(id(r), alive_p). Since this holds for all p, q ∈ Correct and all r ∉ Correct, the claim follows. ■
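The move-to-front behaviour that this proof relies on is easy to check with a minimal Python sketch of Task T2 of Figure 8. The class name `AliveList` is hypothetical, and the sketch models only the handling of received ALIVE messages, not the broadcasting task.

```python
import math

class AliveList:
    """Sketch of Figure 8, Task T2: a move-to-front list of identifiers."""

    def __init__(self):
        self.alive = []              # Line 2: alive_p <- empty list

    def on_alive(self, i):           # Lines 10-13
        if i in self.alive:
            self.alive.remove(i)     # move i ...
        self.alive.insert(0, i)      # ... (or insert it) to the first position

    def rank(self, i):
        """Position of i starting from 1, or infinity if i is absent."""
        return self.alive.index(i) + 1 if i in self.alive else math.inf

x = AliveList()
for i in (1, 2, 3):   # all three processes are heard from
    x.on_alive(i)
# Process 3 crashes; the correct processes 1 and 2 keep sending ALIVE.
for i in (2, 1):
    x.on_alive(i)
print(x.alive)        # [1, 2, 3]
```

After both correct processes have been heard from once more, they occupy ranks 1 and 2, so every correct identifier has rank at most |Correct| = 2, as required by the Liveness property.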
We now show, using the algorithm of Figure 9, that Σ can be obtained from HΣ without initial knowledge of the membership.

Theorem 9. A failure detector of class Σ can be obtained from any detector D of class HΣ in AS[HΣ] (an asynchronous system with unique identifiers), even when the membership is not known initially.

Proof: From Lemma 7, we can have a failure detector of class S in an asynchronous system. The logic of the algorithm of Figure 9 is somewhat similar to that of the algorithm in Figure 2 of [4]. The condition in Line 6 guarantees that the variable trusted_p is assigned a set of identifiers m only if (x, m) is in h_quora_p and every process q whose identifier is in m has x in its set h_labels_q (from the management of the sets idents_p). Combining this condition with the safety property of HΣ, we guarantee the safety property of Σ. The liveness property of Σ follows from the liveness property of HΣ, the choice of m made in Line 8, and the properties of the failure detector class S, as follows. If p ∈ Correct, from the liveness of HΣ, eventually, every time Line 8 is executed, there is some m ∈ candidates_p containing only correct processes. Once the failure detector X of class S has all the correct processes in the lowest ranks of X.alive_p (which eventually happens, from its liveness property), any set m in candidates_p whose largest rank in X.alive_p is minimal contains only correct processes (which yields the liveness of Σ). ■
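The selection rule of Lines 6–8 of Figure 9 can be sketched in Python as a single function, assuming that D.h_quora_p is given as a list of (label, frozenset) pairs, that idents_p is a dictionary whose keys are exactly the labels for which idents_p[x] "has been created", and that X.alive_p is a plain list. The function name `choose_trusted` is hypothetical; ties are broken by taking the first minimum, which is one admissible reading of "any m".

```python
import math

def choose_trusted(h_quora, idents, alive):
    """Sketch of Lines 6-8 of Figure 9.

    h_quora: list of (x, m) pairs from D.h_quora_p, m a frozenset of identifiers
    idents:  dict mapping a label x to the set idents_p[x] built by Task T2
    alive:   list X.alive_p from a class-S detector, most recently heard first
    Returns the chosen m, or None when no candidate exists (trusted_p unchanged).
    """
    def rank(i):
        return alive.index(i) + 1 if i in alive else math.inf

    # Lines 6-7: idents_p[x] exists and every identifier of m has announced x
    candidates = [m for (x, m) in h_quora if x in idents and m <= idents[x]]
    if not candidates:
        return None
    # Line 8: the candidate whose largest rank in X.alive_p is smallest
    return min(candidates, key=lambda m: max((rank(i) for i in m), default=0))

h_quora = [("A", frozenset({1, 2})), ("B", frozenset({2, 3}))]
idents = {"A": {1, 2}, "B": {2, 3}}
print(choose_trusted(h_quora, idents, [1, 2, 3]))  # frozenset({1, 2})
```

If process 3 has crashed, it eventually falls behind the correct processes in X.alive_p, so any quorum containing it has a larger maximal rank and is not chosen.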

Theorem 1. Failure detector classes Σ, HΣ, and AΣ are equivalent in AS[∅]. Furthermore, the transformations between Σ and HΣ do not require initial knowledge of the membership.

Proof of Theorem 1: From Theorems 8 and 9, Σ and HΣ are equivalent. The equivalence between Σ and AΣ was shown in 0.

3) From AΣ to HΣ: We now show how to obtain a failure detector of class HΣ from a detector of class AΣ.

Theorem 2. Class HΣ can be obtained from class AΣ in AAS[∅] without communication.

Proof of Theorem 2: Let D be a detector of class AΣ. The transformation can be done as follows. Let ⊥ be the "default" identifier. Let us denote by ⊥^y a multiset of y identifiers ⊥. Each process p periodically does as follows. For each pair (x, y) ∈ D.a_sigma_p, the label x is included in h_labels_p and the pair (x, ⊥^y) is included in h_quora_p (replacing any pair (x, −) that h_quora_p may contain). The properties of HΣ follow trivially from the properties of AΣ.
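A minimal Python sketch of this periodic step follows. It assumes that D.a_sigma_p is available as a set of (label, size) pairs and represents ⊥^y as a `Counter`; the names `bottoms` and `apply_a_sigma` are hypothetical. Storing h_quora_p as a dictionary keyed by label makes the "replacing any pair (x, −)" rule explicit, and with it the Validity property.

```python
from collections import Counter

BOTTOM = "⊥"  # the single default identifier of the anonymous system

def bottoms(y):
    """The multiset ⊥^y: y copies of the default identifier."""
    return Counter({BOTTOM: y})

def apply_a_sigma(a_sigma, h_labels, h_quora):
    """One periodic step: for each (x, y) in D.a_sigma_p, add label x and
    set the quorum of label x to ⊥^y, replacing any previous (x, -) pair."""
    for x, y in a_sigma:
        h_labels.add(x)
        h_quora[x] = bottoms(y)

h_labels, h_quora = set(), {}
apply_a_sigma({("L1", 3), ("L2", 2)}, h_labels, h_quora)
apply_a_sigma({("L1", 2)}, h_labels, h_quora)
print(sorted(h_labels), h_quora["L1"][BOTTOM])  # ['L1', 'L2'] 2
```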

4) From AP to ◇HP and HΣ: We show here how failure detectors of the classes ◇HP and HΣ can be obtained from a failure detector of class AP without communication.

Lemma 8. A failure detector of class ◇HP can be obtained from any detector D of class AP in AAS[∅] (an anonymous asynchronous system) without communication.

Proof: The transformation can be done as follows. Let ⊥ be the "default" identifier. Each process p periodically updates h_trusted_p to a multiset of D.anap_p identifiers ⊥. The liveness property of D guarantees the liveness property of ◇HP. ■

Lemma 9. A failure detector of class HΣ can be obtained from any detector D of class AP in AAS[∅] (an anonymous asynchronous system) without communication.

Proof: The transformation can be done as follows. Let ⊥ be the "default" identifier. Let us denote by ⊥^y a multiset of y identifiers ⊥. Each process p periodically does as follows. After obtaining a value y from D.anap_p, the label ⊥^y is included in h_labels_p and the pair (⊥^y, ⊥^y) is included in h_quora_p. The Validity and Monotonicity of HΣ hold trivially. Liveness follows since, from the safety of AP, only correct processes see an output D.anap = c = |Correct|, and, from the liveness property, all of them do. Then, every correct process p eventually inserts ⊥^c in h_labels_p and (⊥^c, ⊥^c) in h_quora_p, and only those processes do. Safety of HΣ comes from the safety property of AP: if, for any y and y' with y > y', |S(⊥^y)| = y and |S(⊥^{y'})| = y' (none can be larger), then S(⊥^y) ⊇ S(⊥^{y'}).
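Both transformations from AP are purely local, as the following minimal Python sketch illustrates. It assumes that successive readings of D.anap_p are plain integers, represents ⊥^y as a `Counter` (and as a frozenset of its items when it must be stored in a set), and uses the hypothetical names `bottoms`, `hp_from_ap` and `h_sigma_step`.

```python
from collections import Counter

BOTTOM = "⊥"  # the default identifier shared by all processes

def bottoms(y):
    """Hashable form of the multiset ⊥^y (y copies of the default identifier)."""
    return frozenset(Counter({BOTTOM: y}).items())

def hp_from_ap(anap):
    """Lemma 8: h_trusted_p becomes a multiset of D.anap_p copies of ⊥."""
    return Counter({BOTTOM: anap})

def h_sigma_step(anap, h_labels, h_quora):
    """Lemma 9: after reading y = D.anap_p, add label ⊥^y and pair (⊥^y, ⊥^y)."""
    label = bottoms(anap)
    h_labels.add(label)
    h_quora.add((label, label))

h_labels, h_quora = set(), set()
for y in (4, 4, 3):        # successive readings of D.anap_p at one process
    h_sigma_step(y, h_labels, h_quora)
print(len(h_labels), hp_from_ap(3)[BOTTOM])  # 2 3
```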



Theorem 3. Classes ◇HP and HΣ can be obtained from class AP in AAS[∅] without communication.

Proof of Theorem 3: The proof of Theorem 3 follows from the two previous lemmas.